Grid Data Mining by eans of Learning Classifier Systems and Distributed Model Induction

نویسندگان

  • Manuel Santos
  • Wesley Mathew
  • Henrique Santos
چکیده

This paper introduces a distributed data mining approach suited to grid computing environments based on a supervised learning classifier system. Different methods of merging data mining models generated at different distributed sites are explored. Centralized Data Mining (CDM) is a conventional method of data mining in distributed data. In CDM, data that is stored in distributed locations have to be collected and stored in a central repository before executing the data mining algorithm. CDM method is reliable; however it is expensive (computational, communicational and implementation costs are high). Alternatively, Distributed Data Mining (DDM) approach is economical but it has limitations in combining local models. In DDM, the data mining algorithm has to be executed at each one of the sites to induce a local model. Those induced local models are collected and combined to form a global data mining model. In this work six different tactics are used for constructing the global model in DDM: Generalized Classifier Method (GCM); Specific Classifier Method (SCM); Weighed Classifier Method (WCM); Majority Voting Method (MVM); Model Sampling Method (MSM); and Centralized Training Method (CTM). Preliminary experimental tests were conducted with two synthetic data sets (eleven multiplexer and monks3) and a real world data set (intensive care medicine). The initial results demonstrate that the performance of DDM methods is competitive when compared with the CDM methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Supervised Learning Classifier Systems for Grid Data Mining

This paper explores parallel and distributed implementation of the Learning Classifier System (LCS) technology. Specifically, the adaptation of supervised LCS to the grid data mining requisites, using the agent paradigm, is studied. The paper also examines the competitive data mining model induction possibility with homogeneous and heterogeneous data. A distributed framework is proposed using t...

متن کامل

A Grid Data Mining Architecture for Learning Classifier Systems

Recently, there is a growing interest among the researchers and software developers in exploring Learning Classifier System (LCS) implemented in parallel and distributed grid structure for data mining, due to its practical applications. The paper highlights the some aspects of the LCS and studying the competitive data mining model with homogeneous data. In order to establish more efficient dist...

متن کامل

Agent-Based Learning Classifier Systems for Grid Data Mining

Grid Data Mining tools must be able to cope with very large, high dimensional and, frequently heterogeneous data sets that are geographically distributed and stored in different types of repositories, produced from different devices and retrieved through different protocols. This paper presents an agent-based version of a Learning Classifier System. An experimental study was conducted in a comp...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011